(57) ABSTRACT

There is disclosed, for use in an x86-compatible processor, 
an interface circuit for synchronizing the transfer of signals 
between different clock domains derived from a common 
core clock, where the phase and frequency relationships 
between the different domain clocks are known. The inter- 
face circuit comprises 1) a first latch having a data input for 
receiving a data signal from the first clock domain, a clock 
input for receiving the first clock signal, and an output; 2) a 
second latch having a data input coupled to the first latch 
output, an enable input for receiving a gating signal, a clock 
input for receiving the first clock signal, and an output; 3) a 
third latch having a data input for receiving the data signal, 
an enable input for receiving a gating signal, a clock input 
for receiving the first clock signal, and an output; and 4) a 
multiplexer having a first data input coupled to the second 
latch output, a second data input coupled to the third latch 
output, and a selector input for selecting one of the first data 
input and the second data input for transfer to an output of 
the multiplexer. 

20 Claims, 6 Drawing Sheets 
LOW-LATENCY CIRCUIT FOR
SYNCHRONIZING DATA TRANSFERS 
BETWEEN CLOCK DOMAINS DERIVED 
FROM A COMMON CLOCK 

CROSS-REFERENCE TO RELATED 
APPLICATION 

The present invention is related to that disclosed in U.S. 
patent application Ser. No. 09/477,488, filed concurrently 
herewith, entitled "LOW LATENCY CLOCK DOMAIN
SYNCHRONIZATION CIRCUIT AND METHOD OF
OPERATION." The above application is commonly assigned
to the assignee of the present invention. The disclosure of the 
related patent application is hereby incorporated by refer- 
ence for all purposes as if fully set forth herein. 

TECHNICAL FIELD OF THE INVENTION 

The present invention is directed, in general, to
microprocessors and, more specifically, to synchronization
circuits for transferring data between two different clock
domains controlled by a processing device.

BACKGROUND OF THE INVENTION 

The ever-growing requirement for high performance
computers demands that state-of-the-art microprocessors
execute instructions in the minimum amount of time. Over
the years, efforts to increase microprocessor speeds have
followed different approaches, including increasing the
speed of the clock that drives the processor and reducing the
number of clock cycles required to perform a given
instruction.

Microprocessor speeds may also be increased by reducing
the number of gate delays incurred while executing an
operation. Under this approach, the microprocessor is
designed so that each data bit or control signal propagates
through the smallest possible number of gates when
performing an operation. Additionally, the propagation delay
through each individual gate is minimized in order to
further reduce the end-to-end propagation delay associated
with transmitting a control signal or a data bit during the
execution of an instruction.

One area where it is important to minimize propagation
delays occurs at the interface between clock domains.
Conventional microprocessors contain many clock signals that
are derived from a basic high-frequency core clock. The core
clock signal may be divided down to produce clock signals
that are related, for example, by an N:1 ratio or by an
(N+½):1 ratio. For instance, dividing the core clock by two
and dividing the core clock by four yields two clock signals
that are in a 2:1 ratio. Similarly, dividing the core clock by
two and dividing the core clock by five yields two clock
signals that are in a 2.5:1 ratio. These different clock domain
signals may drive internal microprocessor components or
may be brought off-chip to drive external devices, such as
main memory, input/output (I/O) buses, and the like.
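
The divisor arithmetic in the preceding paragraph can be checked with a short Python sketch. The function name `domain_ratio` and its parameters are illustrative assumptions and do not appear in the disclosure; the sketch only restates the two examples given above.

```python
from fractions import Fraction


def domain_ratio(fast_divisor: int, slow_divisor: int) -> Fraction:
    """Frequency ratio (fast : slow) of two clocks divided from one core clock.

    f_fast / f_slow = (f_core / fast_divisor) / (f_core / slow_divisor)
                    = slow_divisor / fast_divisor
    """
    return Fraction(slow_divisor, fast_divisor)


# Core clock divided by two and by four: an N:1 relationship with N = 2.
print(domain_ratio(2, 4))          # prints 2
# Core clock divided by two and by five: a 2.5:1 relationship.
print(float(domain_ratio(2, 5)))   # prints 2.5
```

Because the core clock frequency cancels out, the ratio depends only on the two divisors, which is why the phase and frequency relationship between such domains is fixed and known.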

At the interface between two clock domains, there is no
guarantee that a signal transmitted from a first clock domain
will be synchronized with the clock in a second clock
domain. Normally, synchronization between different clock
domains is handled by a set of synchronizing flip-flops. A
signal in a first clock domain is first registered in a flip-flop
in the first clock domain. The output of that first flip-flop is
then "double sampled" by two flip-flops in the second clock
domain. Double sampling means that the output of the first
flip-flop feeds the input of a second flip-flop clocked in the
second clock domain. The output of the second flip-flop
feeds the input of a third flip-flop that also is clocked in the
second clock domain. The output of this third flip-flop is
properly synchronized with the second clock domain. An
identical three flip-flop interface circuit is used to
synchronize signals that are being transmitted in the reverse
direction (i.e., from the second clock domain to the first clock
domain). This synchronizing circuit, along with grey code
encoding of multi-bit signals, provides a means for
synchronizing two asynchronous clock domains.
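
The conventional scheme described above can be illustrated with a minimal behavioral sketch in Python. It is not taken from the disclosure: `double_sample` and `to_gray` are hypothetical names, the input list is assumed to hold the first flip-flop's output as seen at successive clock edges of the second domain, and metastability is not modeled.

```python
def double_sample(source_q):
    """Return the third flip-flop's output after each second-domain clock edge.

    source_q: assumed values of the first flip-flop's output, one per
    second-domain clock edge.
    """
    ff2 = ff3 = 0          # second and third flip-flops, assumed reset to 0
    seen = []
    for value in source_q:
        # Both flip-flops update on the same edge: the third takes the old
        # value of the second, and the second samples the first domain.
        ff3, ff2 = ff2, value
        seen.append(ff3)
    return seen


def to_gray(n: int) -> int:
    """Standard binary-reflected Gray code: adjacent values differ in one bit."""
    return n ^ (n >> 1)


print(double_sample([1, 1, 1, 0, 0, 0]))   # prints [0, 1, 1, 1, 0, 0]
print([format(to_gray(n), "03b") for n in range(8)])
```

In this model a change at the source is visible at the synchronized output only after a second clock edge in the receiving domain, which is the latency the remainder of the Background is concerned with.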

The chief drawback of the above-described flip-flop
interface circuit is the fact that there are three gate propagation
delays involved in transmitting a signal from one clock
domain to another clock domain. This necessarily slows
down the operation of the microprocessor and/or an external
device communicating with the microprocessor, since the
circuits in the receiving domain receive the transmitted
signal only after at least three propagation delays.

Therefore, there is a need in the art for improved micro- 
processor designs that maximize the throughput of a pro- 
cessor and any external devices communicating with the 
processor. In particular, there is a need in the art for 
improved circuits that interface signals between different 
clock domains. More particularly, there is a need for inter- 
face circuits that minimize the number of gate delays that 
affect a signal being transmitted from a faster clock domain 
to a slower clock domain, and vice versa. 

SUMMARY OF THE INVENTION 

The limitations inherent in the prior art described above 
are overcome by the present invention, which provides an 
interface circuit for synchronizing the transfer of data 
through an output port from a first clock domain driven by 
a first clock signal to a second clock domain driven by a 
second clock signal. In an advantageous embodiment, the 
interface circuit comprises 1) a first latch having a data input
for receiving a data signal from the first clock domain, an
enable input for receiving an enabling signal, a clock input
for receiving the first clock signal, and an output; 2) a second
latch having a data input coupled to the first latch output, a
clock input for receiving a gating signal, a clock input for
receiving the first clock signal, and an output; 3) a third latch
having a data input for receiving the data signal, an enable
input for receiving a phase select signal, a clock input for
receiving the first clock signal, and an output; and 4) a
multiplexer having a first data input coupled to the second 
latch output, a second data input coupled to the third latch 
output, and a selector input for selecting one of the first data 
input and the second data input for transfer to an output of 
the multiplexer. 

According to one embodiment of the present invention, 
the second clock signal and the first clock signal are derived 
from a common core clock. 

According to another embodiment of the present
invention, a frequency of the second clock signal and a
frequency of the first clock signal are in a ratio of N:1 where
N is an integer.

According to still another embodiment of the present 
invention, a selection signal applied to the selector input 
selects the first data input of the multiplexer when a rising 
edge of the first clock signal is approximately in phase with 
a rising edge of the second clock signal. 

According to yet another embodiment of the present
invention, a frequency of the second clock signal and a
frequency of the first clock signal are in a ratio of (N+½):1
where N is an integer.

According to a further embodiment of the present
invention, a selection signal applied to the selector input
selects the first data input of the multiplexer during one
clock cycle of the second clock signal.

The present invention may also be embodied as an
interface circuit for synchronizing the transfer of data from
an output of a state machine in a first clock domain driven
by a first clock signal to a second clock domain driven by a
second clock signal. In an advantageous embodiment, the
state machine interface circuit comprises 1) a first latch
having a data input for receiving the state machine output,
a clock input for receiving the first clock signal, and an
output; and 2) a second latch having a data input coupled to
the first latch output, a clock input for receiving a gating
signal, and an output coupled to an input of the state
machine.

According to one state machine interface embodiment of 
the present invention, the second clock signal and the first 
clock signal are derived from a common core clock. 

According to another state machine interface embodiment
of the present invention, a frequency of the second clock
signal and a frequency of the first clock signal are in a ratio
of N:1 where N is an integer.

According to still another state machine interface
embodiment of the present invention, a frequency of the second
clock signal and a frequency of the first clock signal are in
a ratio of (N+½):1 where N is an integer.

The foregoing has outlined rather broadly the features and
technical advantages of the present invention so that those
skilled in the art may better understand the detailed
description of the invention that follows. Additional features and
advantages of the invention will be described hereinafter
that form the subject of the claims of the invention. Those
skilled in the art should appreciate that they may readily use
the conception and the specific embodiment disclosed as a
basis for modifying or designing other structures for
carrying out the same purposes of the present invention. Those
skilled in the art should also realize that such equivalent
constructions do not depart from the spirit and scope of the
invention in its broadest form.

Before undertaking the DETAILED DESCRIPTION, it
may be advantageous to set forth definitions of certain words
and phrases used throughout this patent document: the terms
"include" and "comprise," as well as derivatives thereof,
mean inclusion without limitation; the term "or" is
inclusive, meaning and/or; the phrases "associated with"
and "associated therewith," as well as derivatives thereof,
may mean to include, be included within, interconnect with,
contain, be contained within, connect to or with, couple to
or with, be communicable with, cooperate with, interleave,
juxtapose, be proximate to, be bound to or with, have, have
a property of, or the like; and the term "controller" means
any device, system or part thereof that controls at least one
operation; such a device may be implemented in hardware,
firmware or software, or some combination of at least two of
the same. It should be noted that the functionality associated
with any particular controller may be centralized or
distributed, whether locally or remotely. Definitions for
certain words and phrases are provided throughout this
patent document; those of ordinary skill in the art should
understand that in many, if not most instances, such
definitions apply to prior, as well as future uses of such defined
words and phrases.

BRIEF DESCRIPTION OF THE DRAWINGS 
For a more complete understanding of the present
invention, reference is now made to the following
descriptions taken in conjunction with the accompanying drawings,
in which:



FIG. 1 is a block diagram of an exemplary integrated 
processor system, including an integrated microprocessor in 
accordance with the principles of the present invention; 

FIG. 2 illustrates in more detail the exemplary integrated 
microprocessor in FIG. 1 in accordance with one embodi- 
ment of the present invention; 

FIG. 3 is a schematic diagram of a synchronization circuit 
for synchronizing the output of a state machine to a clock 
domain; 

FIG. 4 is a schematic diagram of a synchronization circuit 
for synchronizing the transfer of data between two asyn- 
chronous clock domains; 

FIG. 5 is a timing diagram illustrating the operations of 
the synchronization circuits illustrated in FIGS. 3 and 4 in 
accordance with an exemplary embodiment of the present 
invention; and 

FIG. 6 is a timing diagram illustrating the operations of 
the synchronization circuits illustrated in FIGS. 3 and 4 in 
accordance with an exemplary embodiment of the present 
invention. 

DETAILED DESCRIPTION 

FIGS. 1 through 6, discussed below, and the various
embodiments used to describe the principles of the present
invention in this patent document are by way of illustration
only and should not be construed in any way to limit the
scope of the invention. Those skilled in the art will
understand that the principles of the present invention may be
implemented in any suitably arranged integrated
microprocessor.

Integrated Processor System 

FIG. 1 is a block diagram of an exemplary integrated
processor system, including integrated processor 100 in
accordance with the principles of the present invention.
Integrated microprocessor 100 includes central processing
unit (CPU) 110, which has dual integer and dual floating
point execution units, separate load/store and branch units,
and L1 instruction and data caches. Integrated onto the
microprocessor die are graphics unit 120, system memory
controller 130, and L2 cache 140, which is shared by CPU
110 and graphics unit 120. Bus interface unit 150 interfaces
CPU 110, graphics unit 120, and L2 cache 140 to memory
controller 130.

Integrated memory controller 130 bridges processor 100
to system memory 160, and may provide data compression
and/or decompression to reduce bus traffic over external
memory bus 165 which preferably, although not exclusively,
has a RAMbus™, fast SDRAM or other type protocol.
Integrated graphics unit 120 provides TFT, DSTN, RGB,
and other types of video output to drive display 180.

Bus interface unit 150 interfaces, through I/O interface
152, processor 100 to chipset bridge 190 for conventional
peripheral bus 192 connection (e.g., PCI connection) to
peripherals, such as sound card 194, LAN controller 195,
and disk drive 196, as well as fast serial link 198 (e.g., IEEE
1394 "firewire" bus and/or universal serial bus "USB") and
relatively slow I/O port 199 for peripherals, such as a
keyboard and/or a mouse. Alternatively, chipset bridge 190
may integrate local bus functions such as sound, disk drive
control, modem, network adapter, etc.

Integrated CPU

FIG. 2 illustrates in more detail the exemplary integrated 
processor 100, including CPU 110, which is integrated with 
graphics controller 120, memory controller 130, and L2 
unified cache 140 (e.g., 256 KB in size). CPU 110 includes 
an execution pipeline with instruction decode/dispatch logic 
200 and functional units 250. 
Instruction decode/dispatch logic 200 decodes variable
length x86 instructions into nodes (operations) each
containing source, destination, and control logic. Each
instruction maps into one or more nodes, which are formed into
checkpoints for issue in parallel to functional units 250. The
exemplary execution pipeline includes dual integer units
(EX) 255, dual pipelined floating point units (FP) 260,
load/store unit (LDST) 265, and branch unit (BR) 270.
Hence, a single checkpoint can include up to 2 EX, 2 FP, 1
LDST, and 1 BR nodes which can be issued in parallel. L1
data cache (DC) 280 (e.g., 16 KB in size) receives data
requests from the LDST unit and, in the case of an L1 hit,
supplies the requested data to the appropriate EX or FP unit.

BR unit 270 executes branch operations based on flag
results from the EX units. Predicted (taken/not-taken) and
not-predicted (undetected) branches are resolved
(mis-predictions incur, for example, a 12 clock penalty) and
branch information is supplied to BTB 275, including
branch address, target address, and resolution (taken or not
taken). BTB 275 includes a 1 KB target cache, a 7-bit history
and prediction ROM, and a 16-entry return stack.

Instruction decode/dispatch logic 200 includes L1
instruction cache (IC) 210 (e.g., 16 KB in size) which stores
32-byte cache lines (8 dwords/4 qwords). Each fetch
operation, fetch unit 215 fetches a cache line of 32
instruction bytes from the L1 instruction cache to aligner logic 220.
Fetch unit 215 either (a) generates a fetch address by
incrementing the previous fetch address (sequential fetch)
or, (b) if the previous fetch address hit in BTB 275, switches
the code stream by supplying the fetch address for the cache
line containing the target address provided by BTB 275.
Fetch unit 215 supplies a linear address simultaneously to
L1 instruction cache 210 and BTB 275. A two-level
translation look-aside buffer (TLB) structure (a 32-entry L1
instruction TLB and a 256-entry shared L2 TLB) supplies a
corresponding physical address to the L1 cache to complete
cache access.

Aligner logic 220 identifies up to two x86 variable length
instructions per clock. Instructions are buffered in
instruction buffer 225, along with decode and issue constraints.
Decoder 230 transfers instructions from the instruction
buffer to the appropriate one (as determined by decode
constraints stored with the instruction) of decoders D0, D1,
and Useq (a microsequencer). D0 and D1 define two decode
slots (or paths) S0 and S1, with the Useq decoder feeding
nodes into both slots simultaneously.

D0 and D1 each decode single node EX/FPU/BR
instructions that do not involve memory references (e.g.,
register-register integer and floating point operations and branch
operations), while memory reference instructions, which
decode into separate EX/FP and LDST nodes (e.g.,
register-memory integer and floating point operations), are
constrained to D0. The Useq decoder handles instructions that
decode into more than two nodes/operations (e.g., far calls/
returns, irets, segment register loads, floating point divides,
floating point transcendentals). Each such sequence of nodes
is organized into one or more separate checkpoints issued
in order to the functional units. Renaming logic 235
(including a logical-to-physical map table) renames sources
and destinations for each node, mapping logical to physical
registers.

Issue logic 240 organizes the renamed nodes from each
slot into checkpoints that are scheduled for issue in order to
the functional units. Most instructions can be dual issued
with the nodes for each in the same checkpoint. Up to 16
checkpoints may be active (i.e., issued to functional units).
Nodes are issued into reservation stations in each functional
unit. Once in the reservation stations, the nodes complete
execution out-of-order.

The dual EX0/EX1 (integer) units 255 are pipelined with
separate copies of a physical register file, and execute and
forward results in a single cycle. The dual FPU0/FPU1 units
260 include dual execution units (with separate FP physical
register files) that support MMX and 3DNow instructions, as
well as standard x87 floating point instruction execution.
FPU0 includes a pipelined Fadder and FPU1 includes a
pipelined Fmultiplier, both supporting packed SIMD
operations.

Integer multiply operations are issued to FPU1 with the
Fmultiplier, and integer divide operations are issued as
separate nodes to both FPU0 and FPU1, so that integer EX
operations can execute in parallel with integer multiplies and
divides. Results are forwarded between EX0/EX1 and
FPU0/FPU1 in a single cycle.

LDST unit 265 executes memory reference operations as
loads/stores to/from data cache 280 (or L2 cache 140).
LDST unit 265 performs pipelined linear address calculation
and physical (paged) address translation, followed by data
cache access with the physical (translated) address. Address
translations are performed in order using a two-level TLB
structure (a 32-entry L1 data TLB and the 256-entry shared
L2 TLB). Up to four pending L1 misses can be outstanding.
Missed data returns out of order (from either L2 cache 140
or system memory 160).

Exemplary 16 KB L1 instruction cache 210 is
single-ported 4-way associative, with 2 pending misses. Exemplary
16 KB L1 data cache 280 is non-blocking, dual-ported (one
load port and one store/fill port), 4-way associative, with 4
pending misses. Both L1 caches are indexed with the linear
address and physically tagged with the TLB (translated)
address. In response to L1 misses, L2 cache 140 transfers an
entire cache line (32 bytes/256 bits) in one cycle with a 7
clock access latency for L1 misses that hit in L2 cache 140.

Exemplary 256 KB L2 cache 140 is 8-way associative and
8-way interleaved. Each interleave supports one L1 (code/
data) miss per cycle, and either one L1 store or one L2 fill
per cycle. Portions or all of 2 of the 8 ways may be locked
down for use by graphics controller 120.

For integer register-to-register operations, the execution
pipeline is eleven (11) stages from code fetch to completion:
two cache access stages (IC1 and IC2), two alignment stages
(AL1 and AL2), three decode/rename stages
(DEC0-DEC2), checkpoint issue stage (ISS), and
reservation stage (RS), followed by the execute and result
write-back/forward stages (EX and WB). For integer
register-memory operations, the LDST unit pipeline adds an
additional four stages between RS and EX: address
calculation (AC), translation (XL), and data cache access and
drive back (DC and DB). The floating point adder pipeline
comprises four stages and the floating point multiply
pipeline comprises five stages.

Different functional blocks in integrated processor 100
may operate at different clock speeds. Each group of circuits
that are driven at a specified clock speed is referred to as a
clock domain. As described above in the Background,
special synchronization circuitry is needed to transfer data
from one clock domain to another clock domain. However,
because all of the clock domains in integrated processor 100
are derived from a common core clock, the phase and
frequency relationships between the different clock domains
are known. The present invention uses knowledge of the
phase and frequency relationships between clock domains to
provide unique synchronization circuits that minimize the
number of gates and clock delays encountered when
transferring data from one domain to another domain.



US 6,535,946 B1




FIG. 3 is a schematic diagram of exemplary
synchronization circuit 300 for synchronizing the output of a state
machine to a clock domain. Exemplary synchronization
circuit 300 comprises latch 302, latch 304, inverter 306,
inverter 307, AND gate 308, and state machine logic circuit
310. The data input (D) of latch 302 is connected to the
"next state" output (NEXT) of state machine logic circuit
310, and the enable input (EN) of latch 302 is permanently
connected to a Logic 1 enabling signal. Latch 302 transfers
NEXT to its Q output on the rising edge of CLK.

The output of latch 302 is connected to the data (D) input
of latch 304. Inverters 306 and 307 each invert the CLK signal.
The inverted CLK signal is one input to AND gate 308. The
other input of AND gate 308 receives the PHASE signal.
The output of AND gate 308 is a gated clock signal that is
"high" (or Logic 1) when inverted CLK and PHASE are
both high. The output of AND gate 308 is connected to the
enable (EN) input of latch 304. Latch 304 transfers the
clocked output of latch 302 to the Q output of latch 304 on
the rising edge of the inverted CLK signal from inverter 307,
providing an output which is synchronized with the clock
domain of the CLK signal. The Q output of latch 304
represents the current state (CURRENT) which is connected
as the input to state machine logic circuit 310. Since the
CURRENT input to state machine logic circuit 310 is
synchronized with the CLK signal, the NEXT output of state
machine logic circuit 310 is also synchronized with the CLK
signal.
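
The connectivity of synchronization circuit 300 can be summarized by a small behavioral model in Python. This is only a sketch under stated assumptions: the class and method names are hypothetical, gate delays are ignored, and the three-state counter standing in for state machine logic circuit 310 is an arbitrary example rather than anything specified in the disclosure.

```python
class SyncCircuit300:
    """Behavioral sketch of latches 302 and 304 around logic circuit 310."""

    def __init__(self, next_state_fn, initial_state=0):
        self.next_state_fn = next_state_fn   # stands in for logic circuit 310
        self.q302 = initial_state            # Q output of latch 302
        self.current = initial_state         # Q output of latch 304 (CURRENT)

    def clk_rising(self):
        # Latch 302 has EN tied to Logic 1, so it captures NEXT on every
        # rising edge of CLK.
        self.q302 = self.next_state_fn(self.current)

    def clk_falling(self, phase):
        # Latch 304 is clocked by inverted CLK and enabled by
        # (inverted CLK AND PHASE), so it updates only when PHASE is high.
        if phase:
            self.current = self.q302


# Hypothetical three-state counter; PHASE is high on every other CLK cycle.
sm = SyncCircuit300(lambda state: (state + 1) % 3)
trace = []
for phase in [0, 1, 0, 1, 0, 1]:
    sm.clk_rising()
    sm.clk_falling(phase)
    trace.append(sm.current)
print(trace)   # prints [0, 1, 1, 2, 2, 0]
```

The trace shows CURRENT advancing only in the CLK cycles where PHASE is high, even though latch 302 recomputes NEXT on every rising edge.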

FIG. 4 is a schematic diagram of exemplary
synchronization circuit 400 for synchronizing the transfer of data
between two asynchronous clock domains. Synchronization
circuit 400 transfers the DATA signal off-chip to another
circuit connected to pin 430. Latches 402, 404, and 410 and
multiplexer 412 form a synchronizing circuit for an input data
signal, labeled "DATA" in FIG. 4. Latches 420, 422, and
424 and multiplexer 426 form a synchronizing circuit for an
input data enable signal, labeled "DATA ENABLE" in
FIG. 4. Inverter 406 and AND gate 408 provide a gated
inverted clock signal for use by both synchronizing circuit
groups. Inverter 428 and tri-state driver 414 provide means
for transferring synchronized data from multiplexer 412 to
pin 430 during the high level of the DATA ENABLE signal.

Latch 402 transfers the DATA signal from input D to
output Q on the rising edge of the CLK signal. The enable
(EN) input to latch 402 is connected to Logic 1. The output
Q of latch 402 is connected to input D of latch 404. Inverter
406 inverts CLK and supplies inverted CLK as an input to
AND gate 408. The other input of AND gate 408 receives
the signal labeled "PHASE" in FIG. 4. The gated inverted CLK
output from AND gate 408 is supplied as the enable (EN)
input for latches 404, 410, 422, and 424.

Inverter 407 inverts the CLK signal and clocks latch 404.
Latch 404 transfers the output of latch 402 to its output Q on
the rising edge of the output from inverter 407. In a similar
manner, latch 410 transfers the DATA signal from its D input
to its Q output on the rising edge of the output of inverter
407. The outputs of latches 404 and 410 are provided as data
inputs to multiplexer 412. The phase-select signal, labeled
"PHASE SELECT" in FIG. 4, selects one of the two data
inputs of multiplexers 412 and 426. Thus, multiplexer 412
transfers the output of latch 404 to its output when PHASE
SELECT is high and multiplexer 412 transfers the output of
latch 410 to its output when PHASE SELECT is low.
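
The two paths into multiplexer 412 can be contrasted with a short Python sketch. It is an illustrative model only: the class `DataPath400` and its method names are assumptions, gate delays are ignored, and the caller supplies whatever level DATA has at each clock edge.

```python
class DataPath400:
    """Behavioral sketch of latches 402, 404, 410 and multiplexer 412."""

    def __init__(self):
        self.q402 = 0   # latch 402 output
        self.q404 = 0   # latch 404 output (registered path)
        self.q410 = 0   # latch 410 output (direct path)

    def clk_rising(self, data):
        # Latch 402 (EN tied to Logic 1) samples DATA on the rising CLK edge.
        self.q402 = data

    def clk_falling(self, data, phase):
        # Latches 404 and 410 are clocked by inverted CLK and enabled by the
        # output of AND gate 408 (inverted CLK AND PHASE).
        if phase:
            self.q404 = self.q402   # value captured at the last rising edge
            self.q410 = data        # value present at this falling edge

    def mux412(self, phase_select):
        # PHASE SELECT high selects latch 404; low selects latch 410.
        return self.q404 if phase_select else self.q410


path = DataPath400()
path.clk_rising(data=1)            # DATA is 1 at the rising edge
path.clk_falling(data=0, phase=1)  # DATA is 0 at the falling edge
print(path.mux412(1), path.mux412(0))   # prints 1 0
```

In this model the latch 404 path forwards what DATA held at the preceding rising edge of CLK, while the latch 410 path forwards what DATA holds at the falling edge, so PHASE SELECT effectively chooses which CLK edge the output is referenced to.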

The output of multiplexer 412 is connected to the
non-inverting input of tri-state driver 414. Inverter 428 inverts
the output from multiplexer 426 and provides this as the
inverted input to tri-state driver 414. Tri-state driver 414
transfers the output of multiplexer 412 to its output when the
output of inverter 428 is low (Logic 0). Thus, tri-state driver
414 transfers the output of multiplexer 412 to pin 430 when
the output of multiplexer 426 is high. Otherwise, the tri-state
driver 414 provides a high impedance to pin 430.
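
The enable logic around tri-state driver 414 reduces to a few lines of Python. The function name `tri_state_414` and the use of `None` to represent high impedance are assumptions made for illustration.

```python
HIGH_Z = None   # stands in for the high-impedance state at pin 430


def tri_state_414(mux412_out, mux426_out):
    """Value seen at pin 430 for given multiplexer outputs."""
    inverter_428 = 0 if mux426_out else 1   # inverter 428 inverts mux 426
    # The driver passes its data input only while inverter 428 outputs low.
    return mux412_out if inverter_428 == 0 else HIGH_Z


print(tri_state_414(1, 1))   # prints 1
print(tri_state_414(1, 0))   # prints None
```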

The synchronizing circuit composed of latches 420, 422,
and 424, and multiplexer 426
operates in the same manner as previously described for the
DATA signal, except that the DATA ENABLE signal is
transferred in place of the DATA signal. The Q outputs of
latches 422 and 424 are provided as inputs to multiplexer
426, with the PHASE SELECT signal controlling the output
of multiplexer 426. Multiplexer 426 transfers the output of
latch 422 to inverter 428 when PHASE SELECT is high and
transfers the output of latch 424 to inverter 428 when
PHASE SELECT is low. As previously discussed, tri-state
driver 414 provides means for transferring the DATA signal
from multiplexer 412 to pin 430 during the high level of the
DATA ENABLE signal from multiplexer 426.

FIG. 5 is a timing diagram illustrating the operations of
the synchronization circuits illustrated in FIGS. 3 and 4 in
accordance with an exemplary embodiment of the present
invention. The timing diagram shows the signals: CLOCK
(labeled "CLK" in FIGS. 3 and 4), 2:1 CLOCK, PHASE,
PHASE SELECT, DATA, DATA ENABLE, PIN-OUT,
NEXT STATE, and STATE (labeled "CURRENT" in FIG.
3).

CLOCK is a square wave in which high and low intervals
(or pulses) are sequentially numbered. Even numbers
represent the low pulses of CLOCK and odd numbers represent
the high pulses of CLOCK. An even and odd numbered pair
of adjacent pulses represents a single cycle for CLOCK. The
2:1 CLOCK time line represents a clock signal which is
running at half the rate of CLOCK. For this example, 2:1
CLOCK transitions to high or low when CLOCK transitions
from low to high. The time line for PHASE depicts an
inverse relationship to the 2:1 CLOCK time line (i.e., high
when 2:1 CLOCK is low and low when 2:1 CLOCK is high).
For the purposes of this example, PHASE SELECT is shown
as always high.
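
The pulse numbering can be made concrete with a short Python sketch. The helper names are hypothetical, and the starting polarity of 2:1 CLOCK is an assumption chosen so that PHASE is high during pulses 3 and 4, which matches the walk-through of the DATA signal for FIG. 5.

```python
def clock(pulse):
    """CLOCK level during a numbered pulse: odd pulses high, even pulses low."""
    return pulse % 2


def clock_2to1(pulse):
    """2:1 CLOCK level; it toggles at each rising edge of CLOCK (odd pulses).

    Assumption: 2:1 CLOCK is high during pulses 1 and 2.
    """
    return 1 if ((pulse - 1) // 2) % 2 == 0 else 0


def phase(pulse):
    """PHASE is the inverse of 2:1 CLOCK."""
    return 1 - clock_2to1(pulse)


print("pulse CLOCK 2:1 PHASE")
for p in range(1, 9):
    print(f"{p:5d} {clock(p):5d} {clock_2to1(p):3d} {phase(p):5d}")
```

Each level of 2:1 CLOCK spans one odd and one even pulse, that is, one full CLOCK cycle, so its period is two CLOCK cycles.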

The DATA signal is only transferred to the output of
multiplexer 412 when the PHASE signal is high. During
pulses 3 and 4 (i.e., one cycle of CLOCK), the DATA signal
goes low when PHASE is high. At the same time, during
pulses 3 and 4 (i.e., one cycle of CLOCK), the DATA
ENABLE signal goes high and is clocked through to tri-state
driver 414. Thus, the DATA signal is driven through to
PIN-OUT, which goes from high to low. Subsequently,
during pulses 5 through 12, the DATA signal goes high
again. However, PHASE does not go high again until
pulses 7 and 8. During pulses 7 and 8, the high DATA signal
is driven through latch 404 and multiplexer 412 to tri-state
driver 414. Since the DATA ENABLE signal is still held high by
latch 422, tri-state driver 414 is still enabled. Thus, the
DATA signal is driven through to PIN-OUT, which goes
from low to high. Another exemplary pulse of the DATA
signal is driven through to PIN-OUT during pulses 17-20.

In FIG. 3, the output of latch 304, labeled "CURRENT"
in FIG. 3 and "STATE" in FIG. 5, can only change when
PHASE is high and CLOCK is low (i.e., pulses 4, 8, 12, 16,
etc.). Thus, STATE transitions to State 0 during pulse 4, to
State 1 during pulse 8, to State 2 during pulse 12, and finally
back to State 0 during pulse 16.
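
The update pulses listed above follow directly from the two conditions, as this Python sketch shows. The function name is hypothetical, and the sketch assumes the FIG. 5 numbering in which even pulses are the low intervals of CLOCK and PHASE is high during pulses 3-4, 7-8, 11-12, and so on.

```python
def state_update_pulses(last_pulse):
    """Pulses in which latch 304 may change: PHASE high and CLOCK low."""
    pulses = []
    for p in range(1, last_pulse + 1):
        clock_low = (p % 2 == 0)        # even pulses are CLOCK low
        phase_high = (p % 4) in (3, 0)  # pulses 3-4, 7-8, 11-12, ...
        if clock_low and phase_high:
            pulses.append(p)
    return pulses


pulses = state_update_pulses(16)
print(pulses)   # prints [4, 8, 12, 16]

# Pair each update pulse with the state sequence 0, 1, 2, 0 described above.
states = {p: i % 3 for i, p in enumerate(pulses)}
print(states)   # prints {4: 0, 8: 1, 12: 2, 16: 0}
```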

FIG. 6 is a timing diagram illustrating the operations of
the synchronization circuits illustrated in FIGS. 3 and 4 in
accordance with an exemplary embodiment of the present
invention. For this example, CLOCK is 2.5 times faster than
5:2 CLOCK, with the positive transition of 5:2 CLOCK
coinciding with the beginning of every fifth half-cycle of
CLOCK. The PHASE signal's high interval always begins
and ends with a falling edge of CLOCK and it remains high
for one CLOCK cycle. PHASE SELECT essentially
represents a 5:1 CLOCK which makes its transitions on the rising
edge of the 5:2 CLOCK. In other words, PHASE SELECT
cycles at half the rate of 5:2 CLOCK and one fifth the rate
of CLOCK.
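
The 5:2 relationship can be worked through numerically in Python. The sketch below is illustrative only: the function names are hypothetical, half-cycle 0 is assumed to begin with a rising edge of CLOCK, and PHASE SELECT is assumed to start high; FIG. 6 may use a different starting polarity.

```python
HALF_CYCLES_PER_SLOW_PERIOD = 5   # one 5:2 CLOCK period = 2.5 CLOCK cycles


def slow_clock_rising_edges(n_half_cycles):
    """CLOCK half-cycle indices at which 5:2 CLOCK has a positive transition."""
    return list(range(0, n_half_cycles, HALF_CYCLES_PER_SLOW_PERIOD))


def clock_edge_at(half_cycle):
    """Which CLOCK edge begins a given half-cycle (even index: rising)."""
    return "rising" if half_cycle % 2 == 0 else "falling"


def phase_select(half_cycle):
    """PHASE SELECT level; it toggles at every rising edge of 5:2 CLOCK."""
    return 1 if (half_cycle // HALF_CYCLES_PER_SLOW_PERIOD) % 2 == 0 else 0


edges = slow_clock_rising_edges(20)
print(edges)                               # prints [0, 5, 10, 15]
print([clock_edge_at(h) for h in edges])
print([phase_select(h) for h in edges])    # prints [1, 0, 1, 0]
```

Under these assumptions the rising edges of 5:2 CLOCK alternate between rising and falling edges of CLOCK, and PHASE SELECT repeats every ten half-cycles, that is, every five CLOCK cycles, consistent with the one-fifth rate stated above.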

As in FIG. 5, the DATA signal is only transferred to the
output of multiplexer 412 when the PHASE signal is high.
Latches 404 and 410 are clocked and transfer data from
input to output when PHASE is high and CLOCK is low.
Latches 422 and 424 are clocked by the inverted CLOCK
and transfer data from input to output when PHASE
SELECT is high. PHASE SELECT is used to select the
output of multiplexers 412 and 426 so that the PIN OUT
signal is synchronized to the domain of the 5:2 clock signal.
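
The latch behavior described above (transfer when enabled and CLOCK is low, hold otherwise) can be sketched as a generic behavioral model. This is an illustration only: it does not reproduce the wiring of FIG. 4, the class and function names are hypothetical, and the choice of which multiplexer input corresponds to a high select value is an assumption.

```python
class EnabledLatch:
    """Latch that passes its input only while enable is high and clock is low."""

    def __init__(self, initial=0):
        self.q = initial

    def step(self, d, enable, clock):
        # Transfer data from input to output when enabled and the clock is
        # low; otherwise hold the previously stored value.
        if enable and not clock:
            self.q = d
        return self.q


def mux2(select, in0, in1):
    # Two-input multiplexer; select high picks in1 (assumed polarity).
    return in1 if select else in0


def run(latch, samples):
    """Apply (data, enable, clock) samples in order and collect the outputs."""
    return [latch.step(d, en, clk) for d, en, clk in samples]


if __name__ == "__main__":
    latch = EnabledLatch()
    # (DATA, PHASE, CLOCK) per step
    print(run(latch, [(1, 0, 0), (1, 1, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0)]))
```

In the sample sequence the stored value changes only in the third and fifth steps, the two steps in which the enable is high while the clock is low.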

Although the present invention has been described in
detail, those skilled in the art should understand that they can
make various changes, substitutions and alterations herein
without departing from the spirit and scope of the invention
in its broadest form.

What is claimed is: 

1. An interface circuit for synchronizing the transfer of 
data through an output port from a first clock domain driven 
by a first clock signal to a second clock domain driven by a
second clock signal, the interface circuit comprising: 

a first latch having a data input for receiving a data signal 
from said first clock domain, an enable input for 
receiving said first clock signal, a clock input for 
receiving said first clock signal, and an output;

a second latch having a data input coupled to said first
latch output, an enable input for receiving a gating signal,
a clock input for receiving said first clock signal, and an
output;

a third latch having a data input for receiving said data
signal, an enable input for receiving a phase select 
signal, a clock input for receiving said first clock signal, 
and an output; and 

a multiplexer having a first data input coupled to said 
second latch output, a second data input coupled to said 
third latch output, and a selector input for selecting one
of said first data input and said second data input for 
transfer to an output of said multiplexer. 

2. The interface circuit set forth in claim 1 wherein said 
second clock signal and said first clock signal are derived 
from a common core clock.

3. The interface circuit set forth in claim 2 wherein a 
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of N:1 where N is an
integer. 

4. The interface circuit set forth in claim 3 wherein a
selection signal applied to the selector input selects said first 
data input of said multiplexer when a rising edge of said first 
clock signal is approximately in phase with a rising edge of 
said second clock signal. 

5. The interface circuit set forth in claim 2 wherein a
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of (N+2):1 where N is an
integer. 

6. The interface circuit set forth in claim 5 wherein a 
selection signal applied to the selector input selects said first
data input of said multiplexer during one clock cycle of said 
second clock signal. 

7. An interface circuit for synchronizing the transfer of 
data from an output of a state machine in a first clock domain 
driven by a first clock signal to a second clock domain
driven by a second clock signal, the interface circuit com- 
prising: 



a first latch having a data input for receiving said state 
machine output, an enable input that is set to an enabled 
value, and an output; and 

a second latch having a data input coupled to said first 
latch output, an enable input for receiving a gating 
signal, a clock input for receiving said first clock signal, 
and an output coupled to an input of said state machine. 

8. The interface circuit set forth in claim 7 wherein said 
second clock signal and said first clock signal are derived 
from a common core clock. 

9. The interface circuit set forth in claim 8 wherein a 
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of N:1 where N is an
integer. 

10. The interface circuit set forth in claim 8 wherein a 
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of (N+2):1 where N is an
integer. 

11. A computer system comprising: 

a pipelined, x86-compatible processor having dual integer 
and dual floating point execution units, separate load/store
and branch units, an L1 instruction cache and an
L1 data cache;

system memory for storing data or instructions; 

a core clock; and 

an interface circuit for synchronizing the transfer of data 
through an output port from a first clock domain driven 
by a first clock signal to a second clock domain driven 
by a second clock signal, the interface circuit compris- 
ing: 

a first latch having a data input for receiving a data 
signal from said first clock domain, a clock input for 
receiving said first clock signal, an enable input that 
is set to an enabled value and an output; 

a second latch having a data input coupled to said first 
latch output, an enable input for receiving a gating 
signal, a clock input for receiving said first clock 
signal, and an output; 

a third latch having a data input for receiving said data 
signal, an enable input for receiving a phase select
signal, a clock input for receiving said first clock 
signal, and an output; and 

a multiplexer having a first data input coupled to said 
second latch output, a second data input coupled to 
said third latch output, and a selector input for 
selecting one of said first data input and said second 
data input for transfer to an output of said multi- 
plexer. 

12. The computer system set forth in claim 11 wherein 
said second clock signal and said first clock signal are 
derived from said core clock. 

13. The computer system set forth in claim 12 wherein a 
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of N:1 where N is an
integer. 

14. The computer system set forth in claim 13 wherein a 
selection signal applied to the selector input selects said first 
data input of said multiplexer when a rising edge of said first 
clock signal is approximately in phase with a rising edge of 
said second clock signal. 

15. The computer system set forth in claim 12 wherein a 
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of (N+2):1 where N is an
integer. 

16. The computer system set forth in claim 15 wherein a 
selection signal applied to the selector input selects said first 
data input of said multiplexer during one clock cycle of said
second clock signal. 

17. A computer system comprising: 
a pipelined, x86-compatible processor having dual integer 
and dual floating point execution units, separate load/store
and branch units, an L1 instruction cache and an
L1 data cache;
system memory for storing data or instructions; 
a core clock; and 

an interface circuit for synchronizing the transfer of data 
from an output of a state machine in a first clock 
domain driven by a first clock signal to a second clock 
domain driven by a second clock signal, the interface 
circuit comprising:
a first latch having a data input for receiving said state 
machine output, a clock input for receiving said first 
clock signal, an enable input set to an enabled value, 
and an output; and 
a second latch having a data input coupled to said first
latch output, an enable input for receiving a gating 
signal, a clock input for receiving said first clock 
signal, and an output coupled to an input of said state 
machine. 

18. The computer system set forth in claim 17 wherein 
said second clock signal and said first clock signal are 
derived from a common core clock. 

19. The computer system set forth in claim 18 wherein a 
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of N:1 where N is an
integer. 

20. The computer system set forth in claim 18 wherein a 
frequency of said second clock signal and a frequency of 
said first clock signal are in a ratio of (N+2):1 where N is an
integer. 

***** 